Abstract

Background: Amycolatopsis orientalis is the type species of the genus, and its industrial strain HCCB10007, derived from ATCC 43491, has been used for large-scale production of the vital antibiotic vancomycin. However, to date, neither the complete genomic sequence of this species nor a systematic characterization of the vancomycin biosynthesis cluster (vcm) has been reported. With only the whole genome sequence of Amycolatopsis mediterranei available, additional complete genomes of other species may facilitate intra-generic comparative analysis of the genus.

Results: The complete genome of A. orientalis HCCB10007 comprises an 8,948,591-bp circular chromosome and a 33,499-bp dissociated plasmid. In total, 8,121 protein-coding sequences were predicted, and the species-specific genomic features of A. orientalis were analyzed in comparison with those of A. mediterranei. The common characteristics of Amycolatopsis genomes were revealed via intra- and inter-generic comparative genomic analyses within the domain of actinomycetes, and led directly to the development of sequence-based Amycolatopsis molecular chemotaxonomic characteristics (MCCs). The chromosomal core/quasi-core and non-core configurations of the A. orientalis and A. mediterranei genomes were analyzed reciprocally to further understand both the discriminating criteria and their evolutionary implications. In addition, 26 gene clusters related to secondary metabolism, including the 64-kb vcm cluster, were identified in the genome. Employing a customized PCR-targeting-based mutagenesis system along with the biochemical identification of vancomycin variants produced by the mutants, we were able to experimentally characterize a halogenase, a methyltransferase and two glycosyltransferases encoded in the vcm cluster. The broad substrate specificities of these modification enzymes were also inferred.

Conclusions: This study not only extended the genetic knowledge of the genus Amycolatopsis and the biochemical knowledge of vcm-related post-assembly tailoring enzymes, but also developed methodology useful for in vivo studies in A. orientalis, whose genetic intractability has widely been considered a barrier in this field.
Background

Amycolatopsis orientalis is a Gram-positive filamentous actinomycete that produces vancomycin (Figure 1), a potent glycopeptide antibiotic that has been used for more than three decades to treat serious methicillin-resistant Staphylococcus aureus (MRSA) infections [1]. However, the increasing emergence of vancomycin-resistant S. aureus (VRSA) and vancomycin-resistant enterococci (VRE) reported in recent years presents an urgent challenge to human health and calls for the development of new antibiotics against these pathogens [2-5]. Although some semisynthetic lipoglycopeptide antibiotics, such as telavancin, oritavancin and dalbavancin, have been developed recently and their anti-VRSA activities have been demonstrated in vitro [6], the in vivo potency of these antibiotics is yet to be demonstrated specifically by clinical studies. Thus, further discovery and development of new glycopeptide-type drug candidates continues to be an important mission for biologists and organic chemists.
Figure 1 Morphological differentiation of mycelia in Amycolatopsis orientalis HCCB10007 and chemical structures of vancomycin variants. Scanning electron micrograph of A. orientalis HCCB10007 cultured for one or three days (upper left of the panel). The red arrow indicates the sporulation of A. orientalis cultured for three days. The core structural formula proposed for vancomycin and its variants (upper right of the panel) shows minor modifications of the heptapeptide core of vancomycin. The table below shows the specific formulae and substituent compositions of each vancomycin variant. The alphabetic labels in the table correspond to the legend of Figure 6.
The gene clusters responsible for the biosynthesis of chloroeremomycin (cep) in A. orientalis A82846 and balhimycin (bal) in Amycolatopsis balhimycina DSM 5908, both of which possess a heptapeptide backbone identical to that of vancomycin and similar antibiotic activities, were completely (cep, NCBI accession numbers: AJ223998, AJ223999, and AL078635) or partially (bal, NCBI accession number: Y16952) sequenced and annotated about 10 years ago [7,8]. Thereafter, a series of genes from the two clusters, especially those encoding the post-assembly tailoring enzymes involved in chlorination [9], glycosylation [10-13], and methylation [14], were characterized sequentially. For example, the crystal structures of the TDP-epi-vancosaminyltransferase GtfA [12], the UDP-glucosyltransferase GtfB [11], and the glycopeptide N-methyltransferase MtfA [14] from A. orientalis A82846 were resolved. The methylation function of MtfA from A. orientalis A82846 in the synthesis of glycopeptide antibiotics was studied in Streptomyces toyocaensis [14], and the halogenase activity of BhaA from A. balhimycina DSM 5908 was verified in vivo [9]. Of the enzymes encoded by the vancomycin biosynthetic gene cluster (vcm) in A. orientalis ATCC 43491, in vitro experiments demonstrated that GtfE is responsible for the addition of D-glucose to the hydroxyl of 4-hydroxyphenylglycine and that GtfD can transfer the L-vancosamine moiety to variant glucosyl-peptide substrates [10,15]. In fact, as early as 1997, the A. orientalis GtfE was expressed in S. toyocaensis, and a hybrid glycopeptide antibiotic, namely glucosyl A47934, was produced [16]. However, unlike A. balhimycina, A. orientalis is not amenable to genetic manipulation because of difficulties encountered in DNA transformation [17]. Therefore, most genes of the vcm cluster have been characterized by heterologous expression or in vitro enzymatic/structural analysis [15,18,19], with little in vivo data reported.

To date, the DNA sequences, along with their annotation information, have been provided for vcm cluster genes including those encoding the monooxygenases (NCBI accession numbers: AF486630.1, FJ532347.1), the halogenase (NCBI accession number: FJ532347.1), the glycosyltransferases (NCBI accession number: U84350.1) and the vancomycin-resistance proteins (NCBI accession number: AF060799.1). Of these, the functions of the monooxygenase (OxyB [19]), the glycosyltransferases (GtfE and GtfD [10,15,16]), and the vancomycin-resistance proteins (VanHAX [20,21]) have been well characterized. We cloned and sequenced the whole vcm gene cluster in 2010 (NCBI accession number: HQ679900.1). However, with the exception of the glycosyltransferases and their encoding genes [10,15], the other post-assembly tailoring enzymes encoded in the vcm cluster, such as the halogenase and the methyltransferase, have barely been characterized experimentally so far, and their functions are only assumed based on the similarity of the proteins to those encoded by bal or cep [9,14].

The complete genome sequences of the rifamycin producer Amycolatopsis mediterranei [22,23] not only revealed the special genomic features of the genus Amycolatopsis, but also confirmed it as a clade of rare actinomycetes potentially rich in antibiotic production capabilities. However, although three draft datasets for the genomes of A. orientalis subsp. orientalis were released recently [22,23], neither annotation nor genomic analysis of these glycopeptide antibiotic-producing Amycolatopsis strains is available to date, particularly at the level of the complete genome sequence. Here, we report the whole genome sequence of an industrial strain (HCCB10007) of A. orientalis (CP003410 and CP003411). This strain produces high yields of vancomycin and is derived from the species type strain ATCC 43491 through a series of physical and chemical mutageneses. The high-quality complete genome sequence of A. orientalis was compared intra- and inter-generically with those of its close or distant phylogenetic relatives within the domain of actinomycetes to characterize species-specific and genus-common features of the genomes. Moreover, the functions of the predicted halogenase and methyltransferase of the vcm cluster in A. orientalis were characterized via robust spectroscopic analyses of the corresponding site-specific mutants, generated by a customized homologous recombination mutation method.

Results and discussion 

General and species-specific features of the complete A. orientalis genome

The genome of A. orientalis HCCB10007 comprises two replicons (Figure 2): a large circular chromosome (8,948,591 bp) and a small, dissociated circular plasmid (33,499 bp). The circular chromosomal topology it shares with A. mediterranei U32 [24] and A. mediterranei S699 [25,26], the other two complete Amycolatopsis genomes currently available, implies that this is a common topological feature of the genus, distinct from the linear chromosomes of Streptomyces [27]. The genome of A. orientalis HCCB10007 is about 1.3 Mbp smaller than that of A. mediterranei, and only 8,121 protein-coding sequences (CDSs) were predicted, approximately 1,100 fewer than in the genome of A. mediterranei (Table 1). This difference is mainly accounted for by the non-core regions of A. orientalis, which are ~1.1 Mbp shorter. It is further enhanced to a certain extent (about 0.2 Mbp) by the smaller average size of the intergenic regions (IRs) in both the core and the non-core regions of the A. orientalis genome (Table 1), resulting in a more compact arrangement of genes (coding density of 90.4%) compared with that of A. mediterranei (89.1-89.3%).
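The coding-density and size figures quoted above can be roughly cross-checked from the Table 1 counts. The sketch below is an approximation introduced for illustration, not the authors' method: it assumes coding density is about (number of ORFs x average ORF size) / chromosome length, and because the average ORF sizes in Table 1 are rounded to whole base pairs, the estimate may differ from the reported value by about 0.1%.

```python
# Rough consistency check on Table 1 (values copied from the table).
# Assumption: coding density ~ (ORF count x average ORF size) / chromosome length.

GENOMES = {
    # name: (chromosome length in bp, total ORFs, average ORF size in bp)
    "A. orientalis HCCB10007": (8_948_591, 8_121, 996),
    "A. mediterranei U32": (10_236_715, 9_228, 990),
    "A. mediterranei S699": (10_246_920, 9_227, 989),
}


def approx_coding_density(length_bp: int, n_orfs: int, avg_orf_bp: int) -> float:
    """Estimated percentage of the chromosome covered by ORFs."""
    return 100.0 * n_orfs * avg_orf_bp / length_bp


if __name__ == "__main__":
    for name, (length, n_orfs, avg) in GENOMES.items():
        print(f"{name}: ~{approx_coding_density(length, n_orfs, avg):.1f}% coding")

    ori_len, ori_orfs, _ = GENOMES["A. orientalis HCCB10007"]
    u32_len, u32_orfs, _ = GENOMES["A. mediterranei U32"]
    print(f"Size difference vs U32: {(u32_len - ori_len) / 1e6:.2f} Mbp")
    print(f"CDS difference vs U32: {u32_orfs - ori_orfs}")
```

Run as a script, this prints approximately 90.4%, 89.2% and 89.1% (reported: 90.4%, 89.3% and 89.1%), a size difference of about 1.29 Mbp, and 1,107 fewer CDSs in A. orientalis than in U32, in line with the rounded values in the text.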
Figure 2 Genome atlas of A. orientalis and gene clusters for secondary metabolism. The large circle represents the chromosome: the outer scale is numbered in megabases and indicates the core (red), quasi-core (orange), and non-core (sky blue) regions. The circles are numbered from the outside in. The genes in circles 1 and 2 (forward and reverse strands, respectively) are color-coded according to COG functional categories. Circle 3 shows selected essential genes (cell division, replication, transcription, translation, and amino-acid metabolism; paralogs of essential genes in the non-core regions are not included). Circle 4 shows the secondary metabolic clusters, which are enlarged outside the circle for detailed illustration. The vcm cluster is further illustrated in Figure 6. Circle 5 depicts the RNAs (blue, tRNA; red, rRNA). Circle 6 shows the mobile genetic elements (transposase, phage). Circle 7 depicts the GC content. Circle 8 shows the GC bias (pink, values >0; green, values <0). The small circle on the right side represents the plasmid DNA sequence; its outer scale is numbered in kilobases. All of its genes, regardless of strand, are shown in the same circle. Its circles 2 and 3 are the same as circles 7 and 8 of the large chromosome, respectively.



Table 1 General features of the Amycolatopsis genomes
(Indented rows give values as core/quasi-core / non-core.)

Species                                       A. orientalis    A. mediterranei  A. mediterranei
Strain                                        HCCB10007        U32              S699
Length (bp)                                   8,948,591        10,236,715       10,246,920
  Core/quasi-core vs non-core (Mbp)           5.9 / 3.0        6.1 / 4.1        6.1 / 4.1
GC content                                    69.0%            71.3%            71.3%
  Core/quasi-core vs non-core                 69.0% / 69.0%    71.1% / 71.6%    71.1% / 71.6%
ORFs, total                                   8,121            9,228            9,227
  Core/quasi-core vs non-core                 5,624 / 2,497    5,627 / 3,601    5,630 / 3,597
Proteins with function assigned               5,518 (67.9%)    6,441 (69.8%)    6,386 (69.2%)
Hypothetical proteins with unknown function   2,603 (32.1%)    2,787 (30.2%)    2,841 (30.8%)
Average ORF size (bp)                         996              990              989
  Core/quasi-core vs non-core                 955 / 1,089      966 / 1,027      965 / 1,027
Average intergenic region size (bp)           105              119              121
  Core/quasi-core vs non-core                 101 / 116        115 / 126        118 / 126
Coding density                                90.4%            89.3%            89.1%
  Core/quasi-core vs non-core                 90.5% / 90.3%    89.4% / 89.1%    89.0% / 89.1%
rRNA operons                                  4                4                4
  Core/quasi-core vs non-core                 4 / 0            4 / 0            4 / 0
tRNA genes                                    50               52               52
  Core/quasi-core vs non-core                 41 / 9           41 / 11          41 / 11
With dnaA at oriC as the starting point, all CDSs were numbered in clockwise order (Figure 2). We assigned 5,518 CDSs (67.9%) to known or putative functions, whereas the remaining 2,603 CDSs (32.1%) were annotated as genes encoding hypothetical proteins (Table 1). The dissociated plasmid (designated pXL100) encodes 49 genes, of which 43 are functionally unknown. Similar to A. mediterranei, the A. orientalis genome contains four rRNA operons (16S-23S-5S), and their 16S rRNA sequences share approximately 97% identity or more (Additional file 1: Table S1). Among the four rRNA operons of A. orientalis, the first two, counted clockwise from dnaA, are both transcribed in the forward direction, and their 16S rRNA sequences differ slightly (98-99% identity). The other two are transcribed from the reverse strand and have identical 16S rRNA sequences (Additional file 1: Table S1). A. orientalis has 50 tRNA genes, which are largely similar to those of A. mediterranei in both chromosomal location and anticodon constitution, with only a few exceptions, such as one fewer tRNA gene each for arginine and tyrosine and one more for glutamic acid. It is worth emphasizing that, unlike A. mediterranei, no selenocysteine (Sec) tRNA (tRNASec) was found in the A. orientalis genome. Correspondingly, genes encoding selenocysteine synthase (selA), the Sec-specific elongation factor (selB), and selenophosphate synthase (selD) were not found in the A. orientalis chromosome. Formate dehydrogenase, whose gene in the A. mediterranei genome contains a Sec-encoding UGA codon, is also absent from A. orientalis. Compared with A. mediterranei, A. orientalis shows a clearer sporulation phenotype (Figure 1). Although the genes responsible for this phenotypic difference are yet to be thoroughly defined, two genes encoding spore coat proteins, spsF (AORI_0253) and spsG (AORI_0254), were identified only in the genome of A. orientalis. In contrast, the two pMEA100-like integrated plasmids found in the A. mediterranei genomes are absent from the genome of A. orientalis, whereas the free plasmid pXL100 present in A. orientalis HCCB10007 is not found in any other sequenced Amycolatopsis strain.
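Pairwise identity values such as those reported for the 16S rRNA copies depend on the convention used to count them. A minimal sketch, assuming the two sequences are already aligned (with "-" for gaps) and that gap columns are excluded from the denominator; this is one common convention among several, and the convention used in the study is not stated here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are skipped under this convention
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # one mismatch in 10 columns
    print(percent_identity("ACG-TA", "ACGCTT"))          # gap column ignored
```

Counting gap columns as mismatches instead would give lower values for the second pair, which is why identities from different tools are not always directly comparable.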

Reciprocal BLASTP was used to identify orthologs between A. orientalis and other related actinomycetes (A. mediterranei S699 and U32, Amycolatopsis sp. ATCC 39116, Saccharopolyspora erythraea, Streptomyces coelicolor, Saccharomonospora viridis, Nocardia farcinica, and Mycobacterium tuberculosis; Additional file 1: Figure S1). Under relatively strict criteria (identity > 30%, length coverage > 80%), A. mediterranei (U32 or S699) shares 50.3% of its total CDSs (4,642 or 4,650) as orthologs with A. orientalis, the highest proportion among all of the selected actinomycetes. The genome of Amycolatopsis sp. ATCC 39116 was recently sequenced by the DOE Joint Genome Institute (JGI) [28], and a high-quality draft with 11 contigs was released in GenBank (accession no. AFWY00000000). We annotated it online using the fully automated RAST service [29] and found that, of its 8,328 predicted CDSs, 4,165 (50.0%) are orthologs shared with A. orientalis (Additional file 1: Figure S1). S. erythraea shares 2,871 orthologs (39.9%) with A. orientalis, the second highest number among the sequenced actinomycetes compared, after Amycolatopsis itself, which is consistent with the close phylogenetic relationship between the two genera (Saccharopolyspora vs. Amycolatopsis). Although S. viridis shares only 2,318 orthologs with A. orientalis, its small chromosome (4.3 Mbp) encodes only 3,828 proteins, so the genus Saccharomonospora, represented by S. viridis, is still considered phylogenetically close to Amycolatopsis: it shares an extremely high percentage (approximately 60.6%) of its CDSs as orthologs, even higher than that between any two species of the same genus (Additional file 1: Figure S1).
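Reciprocal best-hit (RBH) ortholog calling under the thresholds above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the hit tuples are hypothetical pre-parsed BLASTP results, "length coverage" is assumed to mean query coverage, and the best hit is assumed to be the one with the highest bit score. The percentage helper reproduces the S. viridis figure from the counts given in the text.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical pre-parsed BLASTP hit:
# (query_id, subject_id, percent_identity, percent_query_coverage, bitscore)
Hit = Tuple[str, str, float, float, float]


def best_hits(hits: List[Hit], min_identity: float = 30.0,
              min_coverage: float = 80.0) -> Dict[str, str]:
    """Map each query to its highest-bitscore subject that passes both filters."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, identity, coverage, bitscore in hits:
        # strict thresholds from the text: identity > 30%, coverage > 80%
        if identity <= min_identity or coverage <= min_coverage:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> Set[Tuple[str, str]]:
    """Ortholog pairs (a, b) in which each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


def percent_shared(n_orthologs: int, n_cds: int) -> float:
    """Orthologs as a percentage of one genome's CDSs."""
    return 100.0 * n_orthologs / n_cds


if __name__ == "__main__":
    a_vs_b = [
        ("ori1", "med1", 85.0, 95.0, 500.0),
        ("ori1", "med2", 40.0, 90.0, 120.0),  # weaker hit, not the best
        ("ori2", "med3", 25.0, 99.0, 80.0),   # fails the identity filter
        ("ori3", "med2", 60.0, 85.0, 300.0),
    ]
    b_vs_a = [
        ("med1", "ori1", 85.0, 96.0, 500.0),
        ("med2", "ori3", 60.0, 88.0, 300.0),
        ("med3", "ori2", 25.0, 99.0, 80.0),
    ]
    print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))
    print(f"S. viridis: {percent_shared(2_318, 3_828):.1f}% of CDSs are orthologs")
```

On the toy hits this keeps ("ori1", "med1") and ("ori3", "med2"), and the helper gives 60.6% for S. viridis. Note that the denominator matters: the same ortholog count is a much larger fraction of a small genome, which is the point made about S. viridis above.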

Genome configuration and plasticity of A. orientalis compared with A. mediterranei

The distinctive chromosomal configuration of core versus non-core regions, characterized by the different distribution of essential genes (i.e., genes coding for functions of cell division, replication, transcription, translation, and amino-acid metabolism) between the corresponding genomic regions, was first recognized in the linear genome of S. coelicolor [30] and then in the circular chromosome of S. erythraea [31]. Recently, a novel "quasi-core" region with typical core characteristics was defined within the non-core region of the A. mediterranei U32 genome, along with the proposition of three discriminating criteria: the gathering of essential genes, the discrepancy in coding density of orthologous genes, and the co-linearity of the orthologs' order [24]. In this study, taking advantage of the availability of the complete genome sequences of two species (A. orientalis and A. mediterranei) from the same genus, we analyzed the chromosomal configuration of these species using more rigorous statistical methods and revealed genomic plasticity related to the production of major antibiotics, probably reflecting chromosomal recombination events.

First, a core region of the A. orientalis genome (nucleotide coordinates 0-3.1 Mbp and 6.3-8.9 Mbp, corresponding to AORI_0001-AORI_2890 and AORI_5565-AORI_8121) was recognized by the good co-linearity of its orthologs' order, with a coding density of essential genes of 14.5% compared with 10.2% in the non-core regions (P < 0.01). Meanwhile, the coding density of orthologous genes in the core region (68.2%) was also higher than that in the non-core regions (37.3%) (P < 0.01) (Figure 3A, Additional file 1: Table S2).
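The core versus non-core comparison rests on coding densities computed in sliding windows. The sketch below shows one way to compute them; the window step (here equal to the window, i.e., non-overlapping windows), dropping the last partial window, and the gene coordinates are assumptions for illustration, since the text specifies only the 90-kb window size.

```python
from typing import List, Tuple

# (start, end, flagged): half-open coordinates in bp; flagged marks the gene
# class being measured, e.g. essential genes or orthologs (hypothetical input).
Gene = Tuple[int, int, bool]


def window_coding_density(genes: List[Gene], genome_len: int,
                          window: int = 90_000, step: int = 90_000) -> List[float]:
    """Percentage of each window covered by flagged genes.

    Assumes flagged genes do not overlap each other; a trailing window
    shorter than `window` is ignored.
    """
    densities = []
    for w_start in range(0, genome_len - window + 1, step):
        w_end = w_start + window
        covered = 0
        for start, end, flagged in genes:
            if flagged:
                # overlap between [start, end) and [w_start, w_end)
                covered += max(0, min(end, w_end) - max(start, w_start))
        densities.append(100.0 * covered / window)
    return densities


if __name__ == "__main__":
    toy_genes = [(0, 9_000, True), (85_000, 95_000, True), (100_000, 110_000, False)]
    print(window_coding_density(toy_genes, 180_000))  # two 90-kb windows
    print(len(window_coding_density(toy_genes, 180_000, 45_000, 45_000)))  # four 45-kb windows
```

For the toy input, the two 90-kb windows have densities of about 15.6% and 5.6%; the gene spanning the window boundary is split between them. Halving the window to 45 kb doubles the number of values available for a statistical comparison.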

Figure 3 Genome configurations of A. orientalis and A. mediterranei. (A) All of the dots in the panels were calculated in a 90-kb sliding window. For the broken X plot (lower right of the panel), the dots represent reciprocal best matches between the genomes of A. orientalis and A. mediterranei, based on the BLASTP comparison. The X-axis (Y-axis) is the nucleotide scale of the A. orientalis (A. mediterranei) chromosome. R1 (4.02-4.28 Mbp, AORI_3663-AORI_3909) and R2 (5.55-5.75 Mbp, AORI_4997-AORI_5173) were designated as the two quasi-core regions in the A. orientalis genome. Reciprocally, two regions (AMED_4864-AMED_5049 and AMED_5970-AMED_7071) were defined as the quasi-core in the A. mediterranei genome. The core and quasi-core regions are highlighted in lavender (A. orientalis) or in pink (A. mediterranei). P1 to P4 designate the regions containing the biosynthesis clusters of rifamycin (rif in A. mediterranei), vancomycin (vcm in A. orientalis), an NRPS (nrps10 in A. mediterranei) and a polyketide (pks9 in A. orientalis), respectively. In the upper right and lower left panels, the pink triangles represent the coding density of all genes; the turquoise squares represent the coding density of orthologs between the genomes of A. orientalis and A. mediterranei; and the yellow circles represent the coding density of the essential genes. The area within the black square frame is the P2 region containing the vcm cluster, with a lower coding density of orthologs and essential genes. (B) Alignment of the P2 region with its flanking genes related to vancomycin biosynthesis in selected actinomycete genomes. The green arrows represent the omitted genes in the corresponding genomes. (C) Alignment of the P1 region with its flanking genes related to rifamycin biosynthesis in selected actinomycete genomes. All of the genome data are available at NCBI.



Second, unlike the comparison between the genomes of A. mediterranei and S. erythraea, the comparison with A. mediterranei defines two quasi-core regions (R1 and R2) within the non-core region of the A. orientalis genome (Figure 3A). The gene orders in these two regions are well conserved with those of A. mediterranei, and the coding densities of orthologous genes are 67.0% (R1, P < 0.05) and 73.3% (R2, P < 0.01), respectively, significantly higher than that of the non-core regions (Additional file 1: Table S2). In addition, the coding density of essential genes in these two regions is also higher than that in the non-core regions (10.2%), reaching 16.0% (R1, P = 0.05) and 22.3% (R2, P < 0.05), respectively (using a 45-kb instead of a 90-kb sliding window for the statistical analysis, as shown in Additional file 1: Table S2). It is worth mentioning that the identification of two quasi-cores is reciprocal, i.e., two regions (AMED_4864-AMED_5049 and AMED_5970-AMED_7071) can be defined as quasi-cores in the A. mediterranei genome by comparison with the genome of A. orientalis (Figure 3A); these regions went unrecognized previously, when the genomes of more distantly related species were compared [24]. In particular, we noticed that all four rRNA operons of both species are located in either the core or quasi-core regions (Additional file 1: Table S1 and Figure 2), as are 41 of the 50 tRNA genes of A. orientalis (41 of the 52 of A. mediterranei), which carry anticodons for all 20 standard amino acids (Table 1).

Comparing the genome of A. orientalis with that of A. mediterranei revealed a large inversion, visible as the typical "X pattern" (Figure 3A). Although the order of orthologs is well conserved between these two species, the lines of the "X pattern" are not continuous and are often interrupted by break points. Most of the break points lie within the non-core regions, which encode the majority of the secondary metabolite biosynthesis gene clusters (Figure 2 and Figure 3A), and might represent horizontal gene transfer events. The rare break points embedded in the core regions, termed P1 to P4, are usually regions containing gene clusters for the synthesis of "species-specific" secondary metabolites, e.g., rifamycin (rif) in P1 of A. mediterranei and vancomycin (vcm) in P2 of A. orientalis (Figure 3A). The P2 region in A. orientalis is nearly 300 kb in length. It not only contains the 64-kb vcm cluster but also encodes many hypothetical proteins and predicted transcriptional regulators, and thus shows a relatively low coding density of orthologs and essential genes. In contrast, the corresponding region of A. mediterranei contains only a few dozen CDSs (spanning less than 100 kb), including two transposase/integrase gene pairs (AMED_1442-AMED_1443 and AMED_1452-AMED_1453; Figure 3B). The AMED_1452-AMED_1453 gene pair is a duplicate of AMED_1442-AMED_1443 in the reverse transcriptional orientation, which suggests that an insertion might have occurred in the P2 region of an ancestral strain, resulting in the acquisition of the vcm gene cluster by A. orientalis. Unlike the P2 flanking regions, the two regions flanking the rif cluster in P1 are highly conserved among the Amycolatopsis species (Figure 3C). As indicated by the alignments, the rif cluster appears to have been inserted between two genes encoding a conserved hypothetical protein (AMED_0612) and the unique DNA-directed RNA polymerase β subunit (RpoB, AMED_0656). Therefore, we speculate that the ancestor of A. mediterranei may have acquired the rif cluster more recently than the insertion event in P2. In addition, a LuxR-like transcriptional regulator (AMED_0655) is located between the 3' end of the rif cluster and the conserved rpoB gene (Figure 3C), and seems to have been acquired together with the rif cluster by the ancestral strain. A potential regulatory function of this LuxR-like protein in rifamycin biosynthesis is inferred, and the corresponding experimental proof is currently being pursued (unpublished data).
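The break points in the "X pattern" are the places where the ortholog order stops being co-linear. A minimal sketch of break-point detection, under a simplification of our own: two orthologs adjacent along genome A stay in the same block only if they are also adjacent along genome B and the block keeps a single direction (forward or inverted). Real analyses usually tolerate small gaps and local rearrangements, and the positions below are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (position in genome A, position in genome B) of one ortholog pair


def colinear_blocks(pairs: List[Pair]) -> List[List[Pair]]:
    """Split ortholog pairs into co-linear blocks; block boundaries are break points."""
    if not pairs:
        return []
    by_a = sorted(pairs)  # order along genome A
    # rank of each ortholog along genome B
    order_b = sorted(range(len(by_a)), key=lambda i: by_a[i][1])
    rank_b = [0] * len(by_a)
    for rank, idx in enumerate(order_b):
        rank_b[idx] = rank

    blocks = [[by_a[0]]]
    direction = 0  # +1 forward, -1 inverted, 0 not yet known
    for i in range(1, len(by_a)):
        step = rank_b[i] - rank_b[i - 1]
        if abs(step) == 1 and direction in (0, step):
            blocks[-1].append(by_a[i])  # same block, same direction
            direction = step
        else:
            blocks.append([by_a[i]])  # break point between i - 1 and i
            direction = 0
    return blocks


if __name__ == "__main__":
    # Hypothetical orthologs: the middle three are inverted in genome B.
    toy = [(1, 10), (2, 20), (3, 30), (4, 60), (5, 50), (6, 40), (7, 70), (8, 80)]
    blocks = colinear_blocks(toy)
    print([len(b) for b in blocks], "break points:", len(blocks) - 1)
```

The toy input yields blocks of sizes 3, 3 and 2 (two break points), with the middle block running in the inverted direction, as a segment on the opposite arm of an "X" would.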

In this study, intra-generic comparative genomic analysis of Amycolatopsis not only confirmed the core/quasi-core and non-core genomic configuration but also revealed certain genomic plasticity hot spots in this genus. It should be noted that the definition of core, quasi-core, or non-core regions of a genome so far remains a relative concept that depends on the genomes of the species or genera being compared. The choice of sliding window size can also influence the characterization of the genomic configuration, as clearly demonstrated when comparing the coding density of essential genes between the quasi-core and non-core regions: when the window size used in the analysis was reduced from 90 kb to 45 kb, thus doubling the sample size, the P-values decreased from non-significant values to < 0.05 (Additional file 1: Table S2). In our opinion, these flexible categorizations are somewhat artificial; however, they are useful tools for inferring different processes in the evolution of a genus or the microevolution of a species. The quasi-core region(s) may represent the remnant(s) of the complex evolutionary dynamics (vertical genomic recombination events) of the ancestral genome, while the non-core regions may represent chromosomal expansions (horizontal gene transfer) in the genomes of the various descendants. As more whole genome sequences of different strains of one species or different species of one genus, as well as those from closely related genera, are published, the biological implications of this genomic plasticity for bacterial phylogeny will be clarified.
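The window-size effect described above is essentially a sample-size effect: with the same per-window differences, more windows give a smaller P-value. The statistical test behind Table S2 is not named in this section, so the sketch below assumes a one-sided Mann-Whitney U test with the normal approximation (no tie or continuity correction) purely for illustration. The per-window densities are invented, and duplicating them is only a crude stand-in for halving the window size, not a model of real 45-kb windows.

```python
import math
from typing import List


def mann_whitney_greater(x: List[float], y: List[float]) -> float:
    """Approximate one-sided P-value for 'values in x tend to exceed values in y'."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank for tied values
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal


if __name__ == "__main__":
    quasi_core = [16.0, 11.5, 10.5]  # hypothetical essential-gene densities (%)
    non_core = [10.0, 8.0, 12.5, 9.0, 11.0]
    print(f"original sample: P = {mann_whitney_greater(quasi_core, non_core):.3f}")
    print(f"doubled sample:  P = {mann_whitney_greater(quasi_core * 2, non_core * 2):.3f}")
```

With these invented values, the first P-value is above 0.05 and the second falls below it, even though the underlying densities are unchanged, which illustrates why the categorization depends on the window size chosen.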

Development of molecular chemotaxonomic characteristics (MCCs) for the genus Amycolatopsis

The taxonomic status of A. orientalis underwent the same revision history as that of A. mediterranei [24]; i.e., it was originally considered a streptomycete [32], then transferred to Nocardia [33], and finally classified as a species of the newly established genus Amycolatopsis [34], which was typically defined by the biochemical characteristics of its cell wall (chemotype IV) and cell membrane (chemotype II). As initiated in the study of the A. mediterranei U32 genome [24], in addition to the molecular genetic basis responsible for the components arabinose, glycine, diaminopimelic acids and mycolic acids (Additional file 1: Table S3, Additional file 1: Table S4, Additional file 1: Figure S2, and Additional file 1: Figure S3), we attempted to analyze the previously unidentified genetic basis of two more chemotaxonomic phenotypes, i.e., phospholipids and menaquinones.

The cell membrane of actinomycetes is classified into five types according to the presence of certain nitrogenous phospholipids [35]. The Amycolatopsis cell membrane belongs to type PII because only one nitrogenous phospholipid, phosphatidylethanolamine (PE), is usually detected in it [35]. In prokaryotes, phosphatidylserine is first generated from CDP-diacylglycerol, a general intermediate for the synthesis of different types of phospholipids, in a reaction catalyzed by phosphatidylserine synthase (PssA, EC: 2.7.8.8), and is then converted into PE [36,37] by phosphatidylserine decarboxylase (Psd, EC: 4.1.1.65) (Figure 4A). Orthologs of both pssA (AORI_7346) and psd (AORI_7345) were identified in the A. orientalis genome. These two genes also exist in other actinomycetes with a type PII cell membrane, such as A. mediterranei, N. farcinica, S. coelicolor, and M. smegmatis, but are absent from actinomycetes with a type PI membrane (no nitrogenous phospholipids) or other types of cell membranes. Moreover, in neither the A. orientalis nor the A. mediterranei genome did we identify genes encoding phosphatidylcholine synthase (Pcs, EC: 2.7.8.24), which catalyzes the formation of phosphatidylcholine (PC), the characteristic type PIII phospholipid, or genes encoding phosphatidylglycerophosphatase A (PgpA, EC: 3.1.3.27), which catalyzes the formation of phosphatidylglycerol (PG), the characteristic type PV phospholipid (Figure 4A). It is worth mentioning that the gene ept1, which encodes ethanolamine phosphotransferase (EPT1, EC: 2.7.8.1), the enzyme that catalyzes the biosynthesis of PE from 1,2-diacylglycerol and CDP-ethanolamine in eukaryotes, is also absent from all of the sequenced actinomycete genomes (Figure 4A).
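The genotype-to-chemotype reasoning in this paragraph (inferring the membrane phospholipid type from which biosynthetic genes are present) can be written as a simple rule check. The sketch is our own simplification, not a published classifier: it encodes only the enzymes named above, treats gene presence as sufficient for the product, does not model type PIV (no enzyme for GluNU is named), and reports only "PII" or "undetermined".

```python
from typing import Dict, FrozenSet, Set

# Hypothetical rule table built from the enzymes named in the text: a
# phospholipid is predicted when every listed enzyme (EC number) is encoded.
PATHWAY_RULES: Dict[str, FrozenSet[str]] = {
    "PE": frozenset({"2.7.8.8", "4.1.1.65"}),  # PssA, then Psd
    "PC": frozenset({"2.7.8.24"}),             # phosphatidylcholine synthase (Pcs)
    "PG": frozenset({"3.1.3.27"}),             # phosphatidylglycerophosphatase A (PgpA)
}


def predicted_phospholipids(genome_ecs: Set[str]) -> Set[str]:
    """Phospholipids whose required enzymes are all present in the genome."""
    return {lipid for lipid, needed in PATHWAY_RULES.items() if needed <= genome_ecs}


def membrane_type(genome_ecs: Set[str]) -> str:
    """'PII' if PE is the only predicted characteristic lipid, else 'undetermined'."""
    return "PII" if predicted_phospholipids(genome_ecs) == {"PE"} else "undetermined"


if __name__ == "__main__":
    a_orientalis = {"2.7.8.8", "4.1.1.65"}  # pssA (AORI_7346) and psd (AORI_7345)
    print(predicted_phospholipids(a_orientalis), membrane_type(a_orientalis))
```

For the A. orientalis gene set this predicts PE only and hence type PII, matching the observed phenotype; a genome carrying pssA but lacking psd would be reported as undetermined rather than silently misclassified.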

Isoprenoid quinones comprise a hydrophilic head and 
an apolar isoprenoid side chain, functioning mainly as 
electron and proton carriers in photosynthetic and respira- 
tory electron transport systems [38]. These compounds 
have also been used as conventional biomarkers in bacter- 
ial chemotaxonomy since the 1960s [39,40]. In the synthe- 
sis of isoprenoid quinones, isoprenyl diphosphate synthase 
(Isp) catalyzes the consecutive condensation of isopentenyl 
diphosphate (IPP) with allylic diphosphates and produces a 
variety of prenyl diphosphates with different chain lengths 
[38]. Previous studies reported that the specific amino-acid 
residues of isoprenyl diphosphate synthases attributable to chain-length determination were designated the chain-length determination (CLD) region [41]. However, for the biosynthesis of isoprenoid quinones with longer chain lengths (more than C30), a consensus CLD region in isoprenyl diphosphate synthases has yet to be identified [41]. In actinomycetes, menaquinone (MK) is the characteristic type of isoprenoid quinone in the cell membrane. We compared the amino-acid sequences (particularly the CLD region) of Isp from type strains of actinomycetes harboring MKs of different lengths, but no regular pattern could be found (Figure 4B). Hence, the isoprenyl diphosphate synthases were analyzed phylogenetically using the neighbor-joining (NJ) method. As shown in Figure 4B, species with MK-7 (C35) in their membranes cluster within one clade, whereas species with MK-8 (C40) or MK-9 (C45) cluster together and are indistinguishable in the tree. A phylogenetic tree was also constructed using the maximum parsimony (MP) method; however, this clustering could not distinguish the species harboring MK-8 from those harboring MK-9 either (Additional file 1: Figure S4). Therefore, it remains to be clarified experimentally whether the genotypes of isoprenyl diphosphate synthases, i.e., their sequence specificities, are a sufficient determinant of all the different side-chain lengths of isoprenoid quinones, or whether the variation of isoprenoid quinones in actinomycetes is a quantitative, rather than a qualitative, property that might be determined by the regulation of gene expression or by other post-transcriptional/post-translational modifications.

Figure 4 Biosynthetic pathways of different types of nitrogenous phospholipids in actinomycetes. (A) The cell membrane of Amycolatopsis belongs to type PII because PE is the dominant phospholipid detected. Two essential proteins (AORI_7345 and AORI_7346, labeled in red) involved in the biosynthesis of PE are encoded by the A. orientalis genome, whereas genes encoding enzymes involved in the other types of nitrogenous phospholipids were not found (NF). Actinomycetes of type PI contain no nitrogenous phospholipids in their cell membrane, while type PII, type PIII, type PIV, and type PV actinomycetes contain the following characteristic phospholipids: PE, PC, GluNu, and PG, respectively. (B) Analysis of isoprenyl diphosphate synthases from type strains of actinomycetes. Strain names and amino-acid sequences are colored according to the MK type of the strain: red, MK-7 (C35); olive-green, MK-8 (C40); blue, MK-9 (C45). The amino-acid sequences of the chain-length determination (CLD) region are highlighted in green on the right of the panel. The protein sequences were obtained from NCBI at http://www.ncbi.nlm.nih.gov/protein/.
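The NJ trees discussed above were built with MEGA 5.0 (see Methods). To make the NJ criterion concrete, here is a minimal textbook neighbor-joining sketch in Python; it is not MEGA's implementation, it omits bootstrapping, and the five-taxon distance matrix is a hypothetical toy example rather than Isp data. At each step it joins the pair of nodes minimizing Q(a, b) = (n - 2)d(a, b) - r(a) - r(b), where r is a node's summed distance to all other nodes, and then recomputes distances to the new internal node.

```python
from itertools import combinations


def neighbor_joining(dist, names):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    dist  -- symmetric matrix (list of lists) of pairwise distances
    names -- taxon labels, in the same order as the rows of dist
    """
    # Copy the matrix into a dict of dicts so clusters can be added and removed.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node: sum of its distances to all other nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair (a, b) that minimises Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node to a and to b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.3f},{b}:{lb:.3f})"
        # Distances from the new internal node to every remaining node.
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Resolve the last three nodes around a single central node.
    x, y, z = nodes
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{lx:.3f},{y}:{ly:.3f},{z}:{lz:.3f});"


# Hypothetical toy distances (not Isp data).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(dist, names))
```

For this toy matrix the sketch prints `(d:2.000,e:1.000,(c:4.000,(a:2.000,b:3.000):3.000):2.000);`. Bootstrap support, as used in the paper, would require rerunning the tree construction on resampled alignments, which is not shown here.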

In summary, our sequential studies of two Amycolatopsis species ([24] and this work) indicate that the chemotaxonomic characteristics of this genus, which are related to, but distinct from, those of Streptomyces and Nocardia, are intrinsically determined by the molecular phylogeny of their encoding genes. On the other hand, the failure to determine precisely the molecular genetic mechanisms underlying MK chain length hints at the complexity of these genotype/phenotype correlations. Together with other important chemotaxonomic characteristics, such as fatty acid composition, these complex phenotypes and their underlying molecular genetic mechanisms may prompt further biochemical and molecular biological studies. We propose that, based on whole-genome analysis of multiple bacterial strains belonging or related to a taxon (particularly a species or genus), potential molecular chemotaxonomic characteristics (MCCs) could be developed, because the genotypes underlie the biochemical characteristics (phenotypes) of the taxon. The implementation of MCCs in bacterial systematics will not only alleviate the tedious workload of chemotaxonomic identification, but also improve our understanding of the genetics of bacterial metabolomes, which will form an indispensable part of modern prokaryotic taxonomy in the genomic era [42].

Biosynthesis of secondary metabolites and the post-assembly modifications of vancomycin in A. orientalis

Twenty-six secondary metabolite biosynthetic gene clusters were predicted in the complete genome of A. orientalis HCCB10007, including nine type I polyketide synthase (PKS) clusters, one type II PKS cluster, ten non-ribosomal peptide synthetase (NRPS) clusters, three hybrid PKS-NRPS clusters, two clusters for terpenoids, one cluster for lycopene (lyc), and one cluster for β-carotene (car) (Figure 2). The total length of these gene clusters was estimated at ~552 kb, which is 6.2% of the whole genome. In contrast to the essential genes, most of the secondary metabolite biosynthetic gene clusters (18 out of 26) are located in the non-core regions (Figure 2).
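The 6.2% figure is the combined cluster length divided by the 8,948,591-bp chromosome. The Python sketch below shows that calculation; the per-cluster coordinates it expects are hypothetical inputs (they are not listed here), and merging overlapping or adjacent intervals before summing, so that no base is counted twice, is our assumption about how shared regions should be treated.

```python
GENOME_BP = 8_948_591  # chromosome length of A. orientalis HCCB10007


def union_length(intervals):
    """Total bp covered by (start, end) intervals, 1-based inclusive.

    Overlapping or adjacent intervals are merged first, so no base is counted twice.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # extend the current merged block
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def genome_percent(intervals, genome_bp=GENOME_BP):
    """Percentage of the chromosome covered by the given (hypothetical) cluster intervals."""
    return 100 * union_length(intervals) / genome_bp


# The ~552 kb total reported above corresponds to about 6.2% of the chromosome.
print(round(100 * 552_000 / GENOME_BP, 1))  # 6.2
```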

To explore the possible phylogenetic relationships of the secondary metabolite biosynthetic gene clusters, all of the CDSs for PKSs, NRPSs, or terpene synthases were compared against the NCBI database using BLASTP. Information on the best hits is provided in Additional file 1: Table S5. Twenty-seven genes in nine biosynthetic gene clusters (34.6% of the 26 clusters) have their best-scoring hits among orthologs in the A. mediterranei genome, i.e., car, pks1, lyc, and tps2 in the core region and pks3, tps1, nrps7, pks5, and pks6 in the non-core regions. Furthermore, the nrps7, pks5, and pks6 gene clusters are located close together in the non-core region, particularly the pks5 and pks6 clusters (Figure 2). This close correlation between sequence similarity and genomic co-localization may indicate a common phylogenetic origin.
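The best hits above come from one-directional BLASTP searches against NCBI, whereas ortholog pairs between A. orientalis and other species are defined in Methods by reciprocal BLASTP with a minimum of 30% identity and "30% length diversity". A minimal Python sketch of such a reciprocal-best-hit filter follows. The input tuples (query, subject, percent identity, bit score) are a hypothetical parsed form of BLAST tabular output, and reading "length diversity" as |La - Lb| / max(La, Lb) ≤ 0.30 is an assumption.

```python
def best_hits(rows):
    """Keep the highest-bitscore subject for each query.

    rows -- iterable of (query, subject, percent_identity, bitscore) tuples,
            e.g. parsed from BLASTP tabular output (hypothetical input format).
    """
    best = {}
    for query, subject, ident, bits in rows:
        if query not in best or bits > best[query][2]:
            best[query] = (subject, ident, bits)
    return best


def reciprocal_orthologs(a_vs_b, b_vs_a, len_a, len_b,
                         min_identity=30.0, max_len_diff=0.30):
    """Return (a, b) pairs that are each other's best hit and pass both filters.

    The length filter reads '30% length diversity' as
    |La - Lb| / max(La, Lb) <= 0.30, which is an assumption.
    """
    best_ab, best_ba = best_hits(a_vs_b), best_hits(b_vs_a)
    pairs = []
    for a, (b, ident, _) in best_ab.items():
        if b not in best_ba or best_ba[b][0] != a:
            continue  # not a reciprocal best hit
        if ident < min_identity:
            continue  # below the identity threshold
        la, lb = len_a[a], len_b[b]
        if abs(la - lb) / max(la, lb) > max_len_diff:
            continue  # protein lengths differ too much
        pairs.append((a, b))
    return pairs
```

The identity and length thresholds are kept as parameters so that the filter can be rerun with other cut-offs.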

Notably, of the eight secondary metabolism gene clusters located in the core and quasi-core regions, four (car, pks1, lyc, and tps2) are orthologous to clusters encoded in the A. mediterranei genome, whereas all of the other, A. orientalis-specific clusters (vcm, pks9, nrps10, and nrps4) are located at the break points of the chromosomal "X pattern" blocks. However, because nrps10 and nrps4 have small coding sizes, only the vcm cluster (64 kb), located at the P2 break point, and the pks9 cluster (AORI_6587-6642, 61.7 kb), located at the P4 break point, could be traced in the "X pattern" blocks (Figure 3A). The KS domains of pks9 are similar to those of the salinosporamide A biosynthetic gene cluster in Salinispora tropica CNB-440 (73% identity) [43]. This cluster is rich in genes encoding modification enzymes, such as glycosyltransferases, halogenase, and cytochrome P450, which suggests that its product is a halogenated glycosidic compound.

In the non-core regions, cluster pks2 (AORI_2937-2956, 79.6 kb) contains a type I polyketide synthase that has been reported to synthesize a glycosidic polyketide, ECO-0501, which shows activity against MRSA and VRE [44]. For the other secondary metabolite biosynthetic gene clusters in the A. orientalis genome, putative substrates or probable products were predicted by catalytic domain analysis against the SBSPKS [45] or NRPSDB [46] databases; the results are listed in Additional file 1: Table S5. We isolated total RNA of A. orientalis from two different cultures (fermentation medium F1 and nutrient medium F5; Additional file 1: Supplementary Materials and Methods) and used reverse-transcription PCR to detect the transcription profiles of the gene clusters that might synthesize potential secondary metabolites. As shown in Additional file 1: Figure S5, in both F1 and F5 media, the genes of three clusters (pks5, nrps2, and vcm) showed significant levels of transcription, with vcm being the highest among all gene clusters tested. Another cluster (nrps4) was expressed in the F1 fermentation medium but not in the F5 medium. Although we failed to identify any novel secondary metabolites, these data provide a foundation for further exploration.

The vcm cluster was annotated to encode a total of 35 proteins (AORI_1471 to AORI_1505), including three vancomycin-resistance proteins (VanH, VanA, and VanX [7,8]), three large NRPSs, several post-assembly tailoring enzymes, and a series of biosynthetic proteins that supply amino-acid precursors (Table 2). Unlike the cep and bal clusters, in which three glycosyltransferase genes were predicted [7,8], the vcm cluster encodes only two glycosyltransferases (AORI_1486 and AORI_1487). On the other hand, the vancomycin-resistance genes vanHAX (AORI_1471-AORI_1473) are predicted only in vcm and not in the other two clusters (Table 2). Elsewhere in the A. orientalis genome, we identified an additional vanA (AORI_8112) and vanX (AORI_2227), as well as a two-component system (AORI_7254-AORI_7255), similar to vanSR of bal, that may be related to vancomycin resistance.

Similar to the biosynthesis of balhimycin and chloroeremomycin, vancomycin biosynthesis involves three steps [17]. The related functional genes inside and outside of the vcm cluster were mapped to the A. orientalis genome (Figure 5). First, seven amino-acid precursors need to be supplied: one leucine, one asparagine, two β-hydroxytyrosines (L-βHt), two 4-hydroxyphenylglycines (L-Hpg), and one 3,5-dihydroxyphenylglycine (L-Dpg). Genes encoding the enzymes responsible for the biosynthesis of the three non-proteinogenic amino acids were identified in the genome, i.e., AORI_1492-AORI_1494 for L-βHt, AORI_1476, AORI_1491, and AORI_1495-AORI_1496 for L-Hpg, and AORI_1502-AORI_1505 for L-Dpg.

Second, the seven precursor amino acids are assembled into a heptapeptide backbone by the NRPSs VcmA (AORI_1478), VcmB (AORI_1479), and VcmC (AORI_1480). These three giant enzymes contain seven modules (M1-M7) with 24 domains that function in the selection, activation, condensation, and epimerization of the amino-acid substrates [17]. M2, M4, and M5 each contain an epimerization (E) domain that converts L-βHt2, L-Hpg4, and L-Hpg5, respectively, into the corresponding D-amino acids. The N-terminal amino acid of vancomycin is N-methyl-D-leucine [47]. However, neither an E domain nor a dual condensation/epimerization domain [48,49] was observed in M1 or in the adjacent C domain of M2. Rausch et al. conjectured that a racemase outside the vcm cluster might be responsible for converting L-leucine into D-leucine, which could then be incorporated directly into the glycopeptides [48]. Across the whole genome, there are 11 genes that potentially encode racemases, including six amino-acid racemases, three CoA racemases, and two mandelate racemases (see Additional file 1: Table S6). A recent genomic analysis of Vibrio cholerae identified a novel PLP-dependent amino-acid racemase (VC1312) that was proven to be necessary and sufficient for the synthesis of unusual D-amino acids, including D-leucine [50]. Using VC1312 as the query sequence, we searched the whole genome of A. orientalis with BLASTP. This revealed one protein annotated as an amino-acid racemase (AORI_0725), which shares 28% amino-acid identity (48% positives) with VC1312 and may function in D-leucine production. Further experiments are required to confirm its involvement in vancomycin synthesis.
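The stereochemical argument above can be checked mechanically: a residue ends up D only if its module carries an E domain or if a D-amino acid is loaded directly. The Python sketch below encodes the VcmA-C modules with E domains in M2, M4, and M5, as stated in the text; the substrates of M3, M6, and M7 (Asn, βHt, Dpg) follow the known vancomycin sequence Leu1-βHt2-Asn3-Hpg4-Hpg5-βHt6-Dpg7. The `loads_d_substrate` switch is a hypothetical stand-in for the external-racemase hypothesis, not a model of NRPS biochemistry.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    substrate: str      # amino acid selected by the module's A domain
    has_e_domain: bool  # epimerization (E) domain present in the module


# Module layout of VcmA-C as described in the text: E domains in M2, M4, and M5.
VCM_MODULES = [
    Module("M1", "Leu", False),
    Module("M2", "βHt", True),
    Module("M3", "Asn", False),
    Module("M4", "Hpg", True),
    Module("M5", "Hpg", True),
    Module("M6", "βHt", False),
    Module("M7", "Dpg", False),
]


def predicted_configuration(modules, loads_d_substrate=()):
    """Predict the L/D configuration of each residue of the NRPS product.

    A residue is D if its module carries an E domain, or if the module is
    assumed (hypothetically) to load a D-amino acid supplied from outside.
    """
    result = []
    for m in modules:
        is_d = m.has_e_domain or m.name in loads_d_substrate
        result.append(("D-" if is_d else "L-") + m.substrate)
    return result


# Without an external racemase, M1 would give L-Leu, unlike the D-Leu found in vancomycin.
print(predicted_configuration(VCM_MODULES))
# Under the racemase hypothesis, D-Leu is supplied directly to M1.
print(predicted_configuration(VCM_MODULES, loads_d_substrate={"M1"}))
```

Only the second call reproduces the D configuration at residues 1, 2, 4, and 5, which is why an E-domain-independent source of D-leucine has to be postulated.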

The last step is the post-assembly modification of the heptapeptide backbone, including its cyclization, halogenation, methylation, and glycosylation. The functions of the modification genes in the vcm cluster were annotated based on their counterparts in the cep and bal clusters [7,8] (Table 2). The oxyA/B/C genes (AORI_1482-1484) likely encode three P450 monooxygenases that are responsible for cyclizing the linear peptide into the heptapeptide ring system [19,51]. Adjacent to them, AORI_1485 (vhal) is predicted to encode a halogenase that shows 94% amino-acid sequence identity with the product of bhaA in bal [9], which chlorinates the βHt residues. However, the exact timing of chlorination is unknown, although it has been proposed to occur before the oxidative couplings [52]. The methylation of the α-NH2 group of D-leucine is catalyzed by a methyltransferase, which has been functionally characterized in the cep cluster [14]. Its orthologous protein in the vcm cluster was found and annotated as Vmt (AORI_1490). Glycosylation is the last of the modifications, and the glycosyltransferases for vancomycin biosynthesis have been well characterized biochemically [10,15]. GtfE (AORI_1487) is responsible for adding the first sugar, a glucose moiety from TDP-glucose, to the 4'-hydroxyl group of Hpg4, and the other glycosyltransferase, GtfD (AORI_1486), adds the second sugar, L-vancosamine from TDP-L-vancosamine, to the 2'-hydroxyl group of the glucose residue. AORI_1487 shows the highest amino-acid sequence similarity to BgtfB encoded by bal (81%) and GtfB encoded by cep (81%), whereas AORI_1486 shows the highest similarity to BgtfC encoded by bal (70%) and GtfC encoded by cep (69%). No glycosyltransferase corresponding to BgtfA or GtfA, which add 4-epi-vancosamine to the βHt6 residue in bal or cep, was found in the genome of A. orientalis. Consistent with this, no 4-epi-vancosamine moiety is present in vancomycin (Figure 1).

Table 2 Annotation of the vcm cluster in A. orientalis and comparison with bal and cep

AORI CDS   Gene name  Annotation                               Best hit genes (NCBI accession no.)  Identity (%)
AORI_1471  vanH       D-lactate dehydrogenase                  -                                    -
AORI_1472  vanA       D-alanine-D-alanine ligase               -                                    -
AORI_1473  vanX       D-alanyl-D-alanine dipeptidase           -                                    -
AORI_1474             Hypothetical protein                     -                                    -
AORI_1475  vtr        Regulator protein                        lcl|Y16952.3_cdsid_CAG25754.1        83.33
AORI_1476  pdh        Prephenate dehydrogenase                 lcl|Y16952.3_cdsid_CAG25755.1        84.12
AORI_1477             ATP-binding cassette, subfamily B        lcl|AJ223999.1_cdsid_CAA11793.1      88.89
AORI_1478  vcmA       Non-ribosomal peptide synthetase         lcl|AL078635.1_cdsid_CAB45052.1      81.31
AORI_1479  vcmB       Non-ribosomal peptide synthetase         lcl|AJ223999.1_cdsid_CAA11795.1      82.34
AORI_1480  vcmC       Non-ribosomal peptide synthetase         lcl|Y16952.3_cdsid_CAC48362.1        85.81
AORI_1481  mbtH       MbtH protein                             lcl|Y16952.3_cdsid_CAC48363.1        89.86
AORI_1482  oxyA       Cytochrome P450                          lcl|Y16952.3_cdsid_CAA76547.1        86.19
AORI_1483  oxyB       Cytochrome P450                          lcl|Y16952.3_cdsid_CAA76548.1        87.44
AORI_1484  oxyC       Cytochrome P450                          lcl|AJ223998.1_cdsid_CAA11791.1      91.69
AORI_1485  vhal       Halogenase                               lcl|Y16952.3_cdsid_CAA76550.1        93.89
AORI_1486  gtfD       Vancosaminyl transferase                 lcl|Y16952.3_cdsid_CAA76553.1        59.93
AORI_1487  gtfE       Glycosyl transferase                     lcl|Y16952.3_cdsid_CAA76552.1        81.17
AORI_1488  vcaC       Methyltransferase family protein         lcl|Y16952.3_cdsid_CAC48364.1        93.87
AORI_1489             LmbE family protein                      lcl|Y16952.3_cdsid_CAC48365.1        75.91
AORI_1490  vmt        Methyltransferase                        lcl|AJ223998.1_cdsid_CAA11779.1      73.21
AORI_1491  hpgT       4-hydroxyphenylglycine aminotransferase  lcl|AJ223998.1_cdsid_CAA11790.1      89.7
AORI_1492  vhp        Hydrolase                                lcl|AJ223998.1_cdsid_CAA11784.1      85.87
AORI_1493  vcmD       Non-ribosomal peptide synthetase         lcl|AJ223998.1_cdsid_CAA11773.1      83.57
AORI_1494  oxyD       Cytochrome P450                          lcl|Y16952.3_cdsid_CAC48370.1        85.35
AORI_1495  hmaS       4-hydroxymandelate synthase              lcl|AJ223998.1_cdsid_CAA11751.1      75.56
AORI_1496  hmo        4-hydroxymandelate oxidase               lcl|Y16952.3_cdsid_CAC48372.1        85.24
AORI_1497             Antiporter                               lcl|Y16952.3_cdsid_CAC48373.1        80
AORI_1498  vcaA       Oxidase                                  lcl|Y16952.3_cdsid_CAC48374.1        88.04
AORI_1499  vcaE       Reductase                                lcl|Y16952.3_cdsid_CAC48375.1        84.21
AORI_1500  vcaB       Aminotransferase                         lcl|AJ223998.1_cdsid_CAA11782.1      88.62
AORI_1501  vcaD       Epimerase                                lcl|Y16952.3_cdsid_CAC48377.1        85.57
AORI_1502  dpgA       Polyketide synthase                      lcl|AJ223998.1_cdsid_CAA11755.1      92.9
AORI_1503  dpgB       Isomerase                                lcl|Y16952.3_cdsid_CAC48379.1        75
AORI_1504  dpgC       Thioesterase                             lcl|Y16952.3_cdsid_CAC48380.1        83.25
AORI_1505  dpgD       Dehydration protein                      lcl|Y16952.3_cdsid_CAC48381.1        85.49

Note: "-" indicates that no orthologous gene was found in the bal (Y16952.3) or cep (AL078635.1, AJ223999.1, and AJ223998.1) clusters.



To characterize the in vivo functions of the predicted halogenase, the putative methyltransferase, and the glycosyltransferases encoded by the vcm cluster, in-frame monogenic mutants of AORI_1485 (vhal), AORI_1490 (vmt), AORI_1486 (gtfD), and AORI_1487 (gtfE) were constructed using a homologous recombination method similar to the PCR-targeting system (Methods). The vancomycin derivatives that accumulated in the corresponding mutant cultures, i.e., dechlorovancomycin, demethylvancomycin, desvancosamine vancomycin, and aglucovancomycin, were collected, and their structures were confirmed by high-performance liquid chromatography-mass spectrometry (HPLC-MS) (Figure 6). In the zone of inhibition test, desvancosamine vancomycin (Figure 6C) and, in particular, dechlorovancomycin (Figure 6B) showed lower bioactivity than vancomycin, whereas aglucovancomycin (Figure 6D) showed slightly higher bioactivity than vancomycin. The bioactivity of demethylvancomycin (Figure 6F) was comparable to that of vancomycin. In addition, using demethylvancomycin or aglucovancomycin as the substrate, dimethylvancomycin (Figure 6G) or dimethylaglucovancomycin (Figure 6E), respectively, was generated in vitro by the heterologously expressed methyltransferase AORI_1490. Their molecular weights were also confirmed by HPLC-MS, and the positions of the two methyl groups on the N terminus of leucine (Figure 1) were further examined by nuclear magnetic resonance (NMR) (Additional file 1: Tables S7 and S8). Dimethylvancomycin showed antibacterial activity comparable to that of vancomycin (Figure 6A), and dimethylaglucovancomycin (Figure 6E), a novel compound, showed activity similar to that of aglucovancomycin. Taken together, neither the removal nor the addition of an N-methyl group affected the in vitro antibacterial activity of vancomycin or its derivatives. For glycosylation, although aglucovancomycin showed slightly higher bioactivity than vancomycin in vitro, its in vivo activity was five-fold lower than that of vancomycin [53], indicating that the sugar moiety may play an important role in conferring favorable pharmacokinetic properties.

Figure 5 Metabolic pathway of vancomycin biosynthesis. Three steps are involved in the biosynthesis of vancomycin, and the related functional genes in and outside of the vcm cluster were mapped. I) Biosynthesis of the amino-acid precursors (right of the panel). The non-ribosomal peptide synthetase VcmD (AORI_1493) loads free tyrosine to form a tyrosyl-S-enzyme, which is hydroxylated by OxyD (AORI_1494) and then released as βHt by the action of Vhp (AORI_1492). The pdh/hpgT/hmaS/hmo genes (AORI_1476, AORI_1491, AORI_1495-1496) are responsible for Hpg synthesis from prephenate, and dpgA/B/C/D (AORI_1502-AORI_1505) are responsible for Dpg synthesis using malonyl-CoA as the starting unit. II) The modified amino acids are assembled into a linear heptapeptide by the NRPSs (VcmABC, AORI_1478-1480) with seven modules (M1-M7, upper left of the panel). A, adenylation domain; C, condensation domain; E, epimerization domain; T, thiolation domain; TE, thioesterase domain. III) The post-assembly modifications of the linear heptapeptide (lower left of the panel) include cyclization (oxyA/B/C, AORI_1482-AORI_1484), halogenation (vhal, AORI_1485), methylation (vmt, AORI_1490), and glycosylation (gtfDE, AORI_1486-AORI_1487). Finally, vancomycin is generated.

Figure 6 Functional characterization and verification of the modification genes in the vcm cluster. The 64-kb vcm cluster is illustrated in detail. AORI_1490 (vmt), AORI_1486 (gtfD), AORI_1487 (gtfE), and AORI_1485 (vhal) were replaced in-frame by selection markers, and AORI_1490 was overexpressed and used in vitro with demethylvancomycin or aglucovancomycin as the substrate. The vancomycin standard (A) and the corresponding variants obtained by isolation from mutant strains or by the in vitro treatments were detected by HPLC-MS: (B) dechlorovancomycin, (C) desvancosamine vancomycin, (D) aglucovancomycin, (E) dimethylaglucovancomycin, (F) demethylvancomycin, and (G) dimethylvancomycin. The structural formulae of the variants are shown in the table in Figure 1. A mass of 20 μg of each compound was used to assay its activity against MRSA, and the picture is representative of three independent experiments.

With the exception of dimethylaglucovancomycin, nearly all of the vancomycin derivatives mentioned above have previously been isolated from natural sources and their antibacterial activities reported [53]; our results agree with these previous findings. However, this study provides, for the first time, in vivo functional characterization of the predicted halogenase, the putative methyltransferase, and the biochemically characterized glycosyltransferases in A. orientalis, together with a systematic analysis of the distinct bioactivities of different vancomycin variants. These in vivo analyses not only demonstrated that the vcm-encoded halogenase and methyltransferase are functionally equivalent to those encoded in the bal and cep clusters, but also suggested that halogenation, methylation, and glycosylation do not proceed in a strict order [17], because the vancomycin variant produced by each mutant lacked essentially only the corresponding modification (the only exception being aglucovancomycin from the gtfE mutant, which lacked both sugar residues). In other words, the tailoring enzymes (except GtfD) are not strictly specific, but have broad substrate spectra in vivo.
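The reasoning about substrate spectra can be stated as two competing models and compared against the mutant phenotypes. In the Python sketch below, the broad-substrate model lets each tailoring enzyme act whenever its own attachment site exists (GtfD needs the glucose added by GtfE), while the strict-series model assumes that one missing step blocks every later one. The step order used for the strict model is hypothetical and chosen only for illustration.

```python
# Tailoring gene -> modification it installs on the vancomycin scaffold.
MODIFICATION = {
    "vhal": "chlorination",
    "vmt": "N-methylation",
    "gtfE": "glucose",
    "gtfD": "vancosamine",
}
# GtfD attaches vancosamine to the glucose added by GtfE.
PREREQUISITE = {"gtfD": "gtfE"}


def missing_independent(knocked_out):
    """Broad-substrate model: each enzyme acts whenever its own attachment site exists."""
    missing = set()
    for gene, mod in MODIFICATION.items():
        prereq = PREREQUISITE.get(gene)
        if gene in knocked_out or (prereq is not None and prereq in knocked_out):
            missing.add(mod)
    return missing


def missing_strict_order(knocked_out, order=("vhal", "vmt", "gtfE", "gtfD")):
    """Hypothetical strict-series model: losing one step blocks every later step."""
    missing, blocked = set(), False
    for gene in order:
        blocked = blocked or gene in knocked_out
        if blocked:
            missing.add(MODIFICATION[gene])
    return missing


for gene in ("vhal", "vmt", "gtfD", "gtfE"):
    print(gene, sorted(missing_independent({gene})), sorted(missing_strict_order({gene})))
```

Only the independent model matches the observed products: dechlorovancomycin, demethylvancomycin, and desvancosamine vancomycin each lack a single modification, and only the gtfE mutant loses both sugars.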



Conclusions 

The genome of A. orientalis HCCB10007 is the first complete genome sequence of a bacterium that synthesizes vancomycin-group antibiotics. Compared with the phylogenetically closely related rifamycin-producing strain A. mediterranei, A. orientalis has a smaller chromosome and a more compact genomic organization. Their different configurations revealed possible chromosomal recombination events, reflecting genomic plasticity related to either vancomycin or rifamycin biosynthesis. By comparison with other actinomycete genomes, the common features of Amycolatopsis genomes were identified, and molecular chemotaxonomic characteristics (MCCs) representing the phospholipid and menaquinone phenotypes of this genus were developed. In addition, genes encoding the tailoring enzymes in A. orientalis were knocked out, and the functions of the predicted halogenase and methyltransferase annotated in the vcm cluster were characterized for the first time. The data provided by this study may facilitate the development of novel lead compounds for drug development, either through combinatorial biosynthesis employing enzymes with newly engineered modification activities, or by using different vancomycin derivatives as starting chemical scaffolds.
Methods

Genome sequencing and assembly 

A. orientalis strain HCCB10007 was deposited at the Institute of Microbiology, Chinese Academy of Sciences, and designated CGMCC No. 6023. A conventional whole-genome shotgun strategy using the Roche 454 GS FLX Titanium system was applied to sequence the HCCB10007 genome. In total, 53 contigs with a total length of 8.9 Mb were assembled from 561,423 reads (average length 408 bp) using the Newbler program (version 2.3) of the 454 software package. The relationships between the contigs were determined by reference to the A. mediterranei genome or by using the ContigScape plugin [54], and the remaining gaps were filled using a multiplex PCR strategy. The final sequence assembly was performed using the phred/phrap/consed package (http://www.phrap.org/phredphrapconsed.html). Sanger sequencing was employed to facilitate gap closure and to amend low-quality regions (score < 60). Finally, a consensus sequence of 8,948,591 bp (with an estimated error rate of less than 0.5 per 100,000 bases) was obtained, providing 25.6-fold coverage.
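The reported coverage is consistent with the 454 read data alone: 561,423 reads × 408 bp ≈ 229.1 Mb, divided by the 8,948,591-bp consensus, gives about 25.6-fold. The short Python sketch below reproduces that arithmetic. Treating the "score < 60" threshold as a Phred-scaled quality (as assigned by phred/consed) is our assumption; under that reading, a score of 60 corresponds to a per-base error probability of 10^-6.

```python
def mean_coverage(n_reads, mean_read_len, genome_len):
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return n_reads * mean_read_len / genome_len


def phred_error_probability(q):
    """Per-base error probability implied by a Phred-scaled quality score q."""
    return 10 ** (-q / 10)


coverage = mean_coverage(561_423, 408, 8_948_591)
print(f"{coverage:.1f}-fold")  # 25.6-fold

# Upper bound on residual errors at the stated rate of 0.5 per 100,000 bases.
print(round(8_948_591 * 0.5 / 100_000))  # 45

# Error probability for the score-60 threshold, if it is read as a Phred score.
print(phred_error_probability(60))
```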

Genome annotation and analysis 

Putative protein-coding sequences were predicted based on the results of both Glimmer and GeneMark. BLASTP results against the KEGG, NR, and CDD databases were used to annotate the CDSs, followed by manual correction. The tRNA genes were predicted directly with tRNAscan-SE v1.23. Essential genes were defined as those encoding proteins functioning in cell division, replication, transcription, translation, and amino-acid metabolism, with reference to the Clusters of Orthologous Groups (COGs) database. Unless otherwise stated, orthologous proteins between A. orientalis HCCB10007 and other related species were defined by reciprocal BLASTP under the following conditions: a minimum of 30% identity and 30% length diversity. The coding density of all genes was defined as the ratio of the total length of protein-coding sequences (CDSs) to the total genomic length, whereas the coding density of essential genes (or orthologs) was defined as the ratio of the sequence length of the essential genes (or orthologs) to the total CDS length in the corresponding non-overlapping sliding window. Statistical comparisons of core vs. non-core and quasi-core vs. non-core regions were expressed as P-values calculated by the grouped t test in the statistical programming language R, employing two window sizes (45 kb and 90 kb); the P-values shown in the text are for the 45-kb window, except for the comparison of essential-gene coding density between quasi-core and non-core regions, for which the 90-kb window was used. MUMmer 3.0 was used to analyze the genome-wide collinearity between A. orientalis HCCB10007 and A. mediterranei. To characterize the molecular chemotaxonomic characteristics (MCCs), the biosynthetic pathways of arabinose, glycine, diaminopimelic acids, mycolic acids, phospholipids, and menaquinones in A. orientalis and other actinomycetes were analyzed using KEGG (http://www.kegg.jp/kegg/pathway.html). Literature searching, sequence alignment, domain comparison, and/or phylogenetic analysis were used to further identify the critical genes that determine the presence, conformation, and chain lengths of these compounds. All of the BLASTP analyses conducted for the MCCs used a threshold E value of 1e-3. The neighbor-joining (NJ) method of the MEGA 5.0 package was used to construct phylogenetic trees based on the 16S rRNA, Isp, and MurE sequences, and the reliability of each branch was tested by 1,000 bootstrap replicates. For the Isp sequences, the maximum parsimony (MP) method was additionally used to obtain a more robust tree topology. The SBSPKS [45] and/or NRPSDB [46] databases were used to predict the probable substrates or products of the secondary metabolite biosynthetic gene clusters.
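The window-based statistics described above can be sketched as follows. This minimal Python version computes, per non-overlapping window, the all-gene coding density (CDS bp / window bp) and the essential-gene or ortholog density (subset bp / total CDS bp), plus a two-sample Welch t statistic. It assumes non-overlapping CDS coordinates and assumes that the "grouped t test" corresponds to R's default `t.test` (Welch); the P-value itself would still come from R or a statistics library.

```python
import math
from statistics import mean, variance


def window_coverage(intervals, genome_len, window):
    """Bases covered per non-overlapping window by (start, end) intervals, 1-based inclusive.

    Assumes the intervals do not overlap each other (overlaps would be double-counted).
    """
    covered = [0] * math.ceil(genome_len / window)
    for start, end in intervals:
        pos = start
        while pos <= end:
            w = (pos - 1) // window             # window index of this position
            w_end = min(end, (w + 1) * window)  # clip the interval at the window edge
            covered[w] += w_end - pos + 1
            pos = w_end + 1
    return covered


def coding_density(cds, genome_len, window):
    """CDS length / window length, per window (the last window may be shorter)."""
    cov = window_coverage(cds, genome_len, window)
    return [c / min(window, genome_len - i * window) for i, c in enumerate(cov)]


def subset_density(subset, cds, genome_len, window):
    """Essential-gene (or ortholog) length / total CDS length, per window."""
    sub = window_coverage(subset, genome_len, window)
    total = window_coverage(cds, genome_len, window)
    return [s / t if t else float("nan") for s, t in zip(sub, total)]


def welch_t(a, b):
    """Welch two-sample t statistic and degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Windows without any CDS return NaN in `subset_density` and would need to be dropped before the t test.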

Construction of monogenic mutations of AORI_1485 (vhal), AORI_1490 (vmt), AORI_1486 (gtfD) and AORI_1487 (gtfE) in A. orientalis

A homologous recombination method similar to the PCR-targeting system applied in Streptomyces was used to mutate the vhal, vmt, gtfD, and gtfE genes in A. orientalis HCCB10007. First, a cosmid library of A. orientalis genomic DNA (with inserts of about 40 kb) was constructed using the SuperCos 1 Cosmid Vector Kit (Agilent Technologies, Inc.). The cosmid clone XL0311, which contains the AORI_1485, AORI_1490, AORI_1486, and AORI_1487 genes, was selected by Southern blotting and then used to knock out vhal, vmt, gtfD, and gtfE individually. Each target gene in XL0311 was replaced precisely with the apramycin-resistance gene using two long recombinational primers (39 nt). The cosmids with mutated target genes were introduced into E. coli BW25113/pIJ790 (λ Red recombination plasmid) and then conjugated into the A. orientalis recipients. The correct exconjugants/knockout clones were selected based on their apramycin resistance (50 μg/ml) on MS medium (mannose 20 g/L, soybean flour 20 g/L, agar 20 g/L, and 10 mmol/L MgCl2) supplemented with nalidixic acid (20 μg/ml) for counter-selection. Gene inactivation by homologous recombination was further confirmed by PCR (for primers, see Additional file 1: Supplementary Materials and Methods).
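The Methods state only that two 39-nt recombinational primers were used. A minimal Python sketch of how such primers are commonly laid out for an in-frame PCR-targeting replacement follows. It assumes that 39 nt is the length of each homology extension, that the forward extension ends with the start codon, and that the reverse extension begins at the stop codon; the cassette-priming sequences are hypothetical placeholders, and only forward-strand genes are handled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (non-ACGT characters are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def knockout_primers(genome, start, stop, fwd_priming, rev_priming, arm=39):
    """Design a primer pair for in-frame replacement of a forward-strand gene.

    genome      -- sequence containing the target gene (forward strand)
    start, stop -- 0-based positions of the first base of the start / stop codon
    *_priming   -- sequences that anneal to the resistance cassette (hypothetical)
    arm         -- length of each homology extension (39 nt assumed from the Methods)

    Forward primer: the `arm` nt ending with the start codon, then fwd_priming.
    Reverse primer: reverse complement of the `arm` nt beginning with the stop
    codon, then rev_priming.
    """
    up_from = start + 3 - arm
    if up_from < 0 or stop + arm > len(genome):
        raise ValueError("homology arm extends beyond the supplied sequence")
    upstream_arm = genome[up_from:start + 3]
    downstream_arm = genome[stop:stop + arm]
    return upstream_arm + fwd_priming, reverse_complement(downstream_arm) + rev_priming


# Toy example: a 36-codon "gene" flanked by filler sequence; [P1]/[P2] are placeholders.
toy = "A" * 50 + "ATG" + "C" * 30 + "TAA" + "G" * 50
print(knockout_primers(toy, 50, 83, "[P1]", "[P2]"))
```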

Vmt expression, purification, and in vitro modification assay

The vmt gene from A. orientalis HCCB10007 genomic DNA was cloned into the pET30a vector and transformed into E. coli BL21(DE3) cells. The expression of His6-tagged Vmt was induced by 1 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG) at 30°C for 4 h, and the protein was then purified by nickel-nitrilotriacetic acid (Ni-NTA) affinity chromatography (Qiagen, Valencia, CA, USA). The in vitro methylation system contained 2 mM S-adenosyl-L-methionine (Sigma-Aldrich Canada), 10 mM His6-Vmt (3 mg), 500 mM substrate (demethylvancomycin or aglucovancomycin), and 50 mM Tris-HCl (pH 7.5) in a total volume of 1 ml. The reaction was conducted at 25°C for 24 h and was stopped by adding an equal volume of cold methanol; the mixture was then incubated at -20°C for 20 min and centrifuged for 5 min at 10,000 × g. The supernatant was analyzed by HPLC-MS.

HPLC-MS analyses 

The vancomycin derivatives were prepared from ultrasonically lysed suspensions of the culture pellets, and the cell debris was removed by centrifugation and filtration. HPLC-Q-TOF-MS (Waters Micromass Q-TOF Premier mass spectrometer) analysis was then used to identify the derivatives. HPLC was performed at 40°C using an ACQUITY HPLC BEH C18 column (100 mm × 2.1 mm i.d., 1.7 μm; Waters Corp., Milford, MA, USA) equipped with an ACQUITY HPLC VanGuard pre-column (5 mm × 2.1 mm i.d., 1.7 μm; Waters Corp., Milford, MA, USA). Solvent A (0.05% TFA in water) and solvent B (0.05% TFA in acetonitrile) were used as the mobile phase at a flow rate of 0.4 ml min⁻¹. The following gradient was used: t = 0 min, 5% B; t = 2.2 min, 15% B; t = 4.5 min, 30% B; t = 12.5 min, 99% B. All samples were detected at a wavelength of 240 nm.
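As a worked reading of the gradient program above, the sketch below returns the %B composition at any time point. It assumes linear ramps between the listed breakpoints and holds 99% B after 12.5 min, since the program beyond that point is not given.

```python
# (time in min, %B) breakpoints of the gradient given above.
GRADIENT = [(0.0, 5.0), (2.2, 15.0), (4.5, 30.0), (12.5, 99.0)]


def percent_b(t, program=GRADIENT):
    """%B at time t, assuming linear ramps between breakpoints.

    Before the first breakpoint the initial composition is returned; after the
    last one the final composition is held (not specified in the Methods).
    """
    if t <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return program[-1][1]


print(percent_b(8.5))  # halfway through the 4.5-12.5 min ramp: 64.5
```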

Zone of inhibition test 

This test was conducted using Staphylococcus aureus cultured in LB medium as the indicator strain. The soft top agar of the test plate consisted of 10 g L⁻¹ tryptone, 5 g L⁻¹ yeast extract, 5 g L⁻¹ NaCl, and 16 g L⁻¹ agar, and the indicator cells were added to the soft agar at a final concentration of 10^ cfu/ml. Then, 20 μg of each vancomycin variant was carefully dropped onto drug-sensitivity test slips, which were placed in the center of the agar plates. Observations were made after 20 hours at 37°C. Three independent experiments were performed, and representative pictures were chosen for the figure.

Nucleotide sequence accession numbers 

The nucleotide sequences of the chromosome and plas- 
mid were deposited in the GenBank database under acces- 
sion numbers [CP003410] and [CP003411], respectively. 

Availability of supporting data 

All of the supporting data, named "Additional files for the Amycolatopsis orientalis genome paper", were deposited in an open-access repository of the CHGC (Chinese National Human Genome Center at Shanghai) database. Please refer to http://chgc.sh.cn/ch/Ao.html. The phylogenetic trees based on the 16S rRNA, Isp, and MurE sequences were deposited in the TreeBASE database (www.treebase.org). Please refer to http://purl.org/phylo/treebase/phylows/study/TB2:S15601.

Additional file 



Additional file 1: Figure S1. (A) Phylogenetic tree based on 16S ribosomal RNA of selected actinobacteria and other related species. (B) Comparative analyses of the orthologs between different actinomycete genomes. Table S1. Comparative analysis of the 16S ribosomal RNAs within and between the A. orientalis and A. mediterranei genomes. Table S2. The P-values derived from the grouped t test for the coding densities of orthologs or essential genes, comparing the core (or R1, or R2) region against the non-core regions under different sliding window sizes. Table S3. Enzymes in different actinomycetes involved in the pathway incorporating arabinose into the cell wall. Table S4. Genes characterized in different actinomycetes responsible for recruiting glycine residues that cross-bridge the peptidoglycan lateral chains. Figure S2. Phylogenetic analyses of MurE in actinomycetes. Figure S3. Genetic organization of the fadD-pks-accD and fas-I gene clusters in 20 selected actinobacterial genomes. Figure S4. Phylogenetic analysis of isoprenyl diphosphate synthases from type strains of actinomycetes using the MP method. Table S5. Orthologs of secondary metabolite genes in the A. orientalis HCCB10007 genome compared to the NCBI database. Figure S5. Reverse-transcription PCR of RNA isolated from different cultures. Table S6. Genes encoding racemases in the A. orientalis HCCB10007 genome. Table S7. NMR spectroscopic data for dimethylvancomycin. Table S8. NMR spectroscopic data for dimethylaglucovancomycin.